
BitPolar


Near-optimal vector quantization with zero training overhead



Compress embeddings to 3-8 bits with provably unbiased inner products and no calibration data. Implements TurboQuant (ICLR 2026), PolarQuant (AISTATS 2026), and QJL (AAAI 2025) from Google Research.

Key Properties

  • Data-oblivious — no training, no codebooks, no calibration data
  • Deterministic — fully defined by 4 integers: (dimension, bits, projections, seed)
  • Provably unbiased — inner product estimates satisfy E[estimate] = exact at 3+ bits
  • Near-optimal — distortion within ~2.7x of the Shannon rate-distortion limit
  • Instant indexing — vectors compress on arrival, 600x faster than Product Quantization
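
The unbiasedness property can be seen in a few lines of numpy. The sketch below is illustrative only (it is not BitPolar's API): it stores one sign bit per seeded Gaussian projection plus the vector's norm, and the `sqrt(pi/2)` correction factor makes the sign-sketch estimator unbiased for the inner product.

```python
import numpy as np

rng = np.random.default_rng(42)
d, m = 64, 50_000                # dimension, number of 1-bit projections

v = rng.standard_normal(d)       # vector to compress
q = rng.standard_normal(d)       # query, kept in full precision

S = rng.standard_normal((m, d))  # shared seeded Gaussian projections
code = np.sign(S @ v)            # stored: m sign bits (plus ||v||)
norm_v = np.linalg.norm(v)

# For Gaussian s:  E[sign(<s,v>) * <s,q>] = sqrt(2/pi) * <v,q> / ||v||,
# so rescaling by ||v|| * sqrt(pi/2) gives an unbiased inner-product estimate.
est = norm_v * np.sqrt(np.pi / 2) * np.mean(code * (S @ q))
exact = float(v @ q)
```

With enough projections the estimate concentrates around the exact inner product; the real library reserves this mechanism for the small residual left after the polar stage, so far fewer bits are needed.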

What's New in 0.3.x

  • 58 integrations — every major AI framework, vector database, and ML library

  • PyTorch torchao — embedding quantizer, BitPolarLinear, KV cache

  • FAISS drop-in — API-compatible IndexBitPolarIP/L2 replacement

  • LlamaIndex, Haystack, DSPy — VectorStore and Retriever integrations

  • Agentic AI — LangGraph, CrewAI, OpenAI Agents, Google ADK, SmolAgents, PydanticAI

  • Agent memory — Mem0, Zep, Letta backends

  • 11 vector databases — Milvus, Weaviate, Pinecone, Redis, ES, DuckDB, SQLite, and more

  • LLM inference — llama.cpp, SGLang, TensorRT, Ollama, MLX KV cache compression

  • ML frameworks — JAX/Flax, TensorFlow/Keras, scikit-learn pipeline

  • 30 Python examples covering all integrations

  • Walsh-Hadamard Transform — O(d log d) rotation with O(d) memory (577x less than Haar QR)

  • Python bindings — PyO3 + maturin, zero-copy numpy integration

  • WASM bindings — browser-side vector search via wasm-bindgen

  • no_std support — embedded/edge deployment with alloc feature

Quick Start

Rust

[dependencies]
bitpolar = "0.3"

use bitpolar::TurboQuantizer;
use bitpolar::traits::VectorQuantizer;

// Create quantizer from 4 integers — no training needed
let q = TurboQuantizer::new(128, 4, 32, 42).unwrap();

// Encode a vector
let vector = vec![0.1_f32; 128];
let code = q.encode(&vector).unwrap();

// Estimate inner product without decompression
let query = vec![0.05_f32; 128];
let score = q.inner_product_estimate(&code, &query).unwrap();

// Decode back to approximate vector
let reconstructed = q.decode(&code);

Python

pip install bitpolar

import numpy as np
import bitpolar

# Create quantizer — no training needed
q = bitpolar.TurboQuantizer(dim=768, bits=4, projections=192, seed=42)

# Encode/decode
embedding = np.random.randn(768).astype(np.float32)
code = q.encode(embedding)
decoded = q.decode(code)

# Build a search index over a batch of embeddings
embeddings = np.random.randn(100, 768).astype(np.float32)
index = bitpolar.VectorIndex(dim=768, bits=4)
for i, vec in enumerate(embeddings):
    index.add(i, vec)

query = np.random.randn(768).astype(np.float32)
ids, scores = index.search(query, top_k=10)

JavaScript (WASM)

import init, { WasmQuantizer, WasmVectorIndex } from 'bitpolar-wasm';

await init();

const q = new WasmQuantizer(128, 4, 32, 42n);
const vector = new Float32Array(128).fill(0.1);
const code = q.encode(vector);
const decoded = q.decode(code);

const index = new WasmVectorIndex(128, 4, 32, 42n);
index.add(0, vector);
const query = new Float32Array(128).fill(0.05);
const results = index.search(query, 5);

Walsh-Hadamard Transform

The WHT provides an O(d log d) alternative to Haar QR rotation:

| Property | Haar QR (0.1.x) | Walsh-Hadamard (0.2.x+) |
|---|---|---|
| Time complexity | O(d²) | O(d log d) |
| Memory | O(d²) — 2.3 MB @ d=768 | O(d) — 4 KB @ d=768 |
| Quality | Exact Haar distribution | Near-Haar (JL guarantees) |
| Deterministic | Yes (seed-based) | Yes (seed-based) |

use bitpolar::wht::WhtRotation;
use bitpolar::traits::RotationStrategy;

let embedding = vec![0.1_f32; 768];
let wht = WhtRotation::new(768, 42).unwrap();
let rotated = wht.rotate(&embedding);
let recovered = wht.rotate_inverse(&rotated);
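
For intuition, here is a minimal numpy version of a seeded sign-flip plus WHT rotation. This is an illustrative sketch, not the crate's implementation; it assumes the dimension is a power of two (zero-padding is the usual workaround otherwise).

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform: O(d log d) time, O(d) memory.
    Requires len(x) to be a power of two."""
    x = np.asarray(x, dtype=np.float64).copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b   # butterfly step
        h *= 2
    return x / np.sqrt(d)

rng = np.random.default_rng(42)
d = 16
signs = rng.choice([-1.0, 1.0], size=d)   # seeded random diagonal D

v = rng.standard_normal(d)
rotated = fwht(signs * v)                 # rotation: (H / sqrt(d)) @ D @ v

# H/sqrt(d) is symmetric and orthonormal, so it is its own inverse;
# applying it again and undoing the signs recovers v exactly.
recovered = signs * fwht(rotated)
```

Because the rotation is fully determined by the seed, only the seed needs to be stored or shared, which is what keeps the memory footprint at O(d).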

API Overview

Core Quantizers

| Type | Description | Use Case |
|---|---|---|
| TurboQuantizer | Two-stage (Polar + QJL) | Primary API — best quality |
| PolarQuantizer | Polar coordinate encoding | Simpler, fallback option |
| QjlQuantizer | 1-bit JL sketching | Residual correction |
| WhtRotation | Walsh-Hadamard rotation | Fast, memory-efficient rotation |

Specialized Wrappers

| Type | Description |
|---|---|
| KvCacheCompressor | Transformer KV cache compression |
| MultiHeadKvCache | Multi-head attention KV cache |
| TieredQuantization | Hot (8-bit) / Warm (4-bit) / Cold (3-bit) |
| ResilientQuantizer | Primary + fallback for production robustness |
| OversampledSearch | Two-phase approximate + exact re-ranking |
| DistortionTracker | Online quality monitoring (EMA MSE/bias) |

Language Bindings

| Package | Install | Language |
|---|---|---|
| bitpolar | `cargo add bitpolar` | Rust |
| bitpolar | `pip install bitpolar` | Python (PyO3) |
| @mmgehlot/bitpolar-wasm | `npm install @mmgehlot/bitpolar-wasm` | JavaScript (WASM) |
| @mmgehlot/bitpolar | `npm install @mmgehlot/bitpolar` | Node.js (NAPI-RS) |
| bitpolar-go | `go get github.com/mmgehlot/bitpolar/...` | Go (CGO) |
| bitpolar | Maven Central | Java (JNI) |
| bitpolar-pg | `cargo pgrx install` | PostgreSQL |

58 Integrations — Every Major AI Framework

BitPolar is the single canonical library for vector quantization across the entire AI/ML ecosystem.

RAG & Search Frameworks

| Integration | Package | Description |
|---|---|---|
| LangChain | langchain_bitpolar | VectorStore with compressed similarity search |
| LlamaIndex | llamaindex_bitpolar | BasePydanticVectorStore for LlamaIndex |
| Haystack | bitpolar_haystack | DocumentStore + Retriever component |
| DSPy | bitpolar_dspy | Retriever module for DSPy pipelines |
| FAISS | bitpolar_faiss | Drop-in replacement for faiss.IndexFlatIP/L2 |
| ChromaDB | bitpolar_chroma | EmbeddingFunction + two-phase search store |

Agentic AI Frameworks

| Integration | Package | Description |
|---|---|---|
| LangGraph | bitpolar_langgraph | Compressed checkpoint saver for stateful agents |
| CrewAI | bitpolar_crewai | Memory backend for agent teams |
| OpenAI Agents SDK | bitpolar_openai_agents | Function-calling tools for OpenAI agents |
| Google ADK | bitpolar_google_adk | Tool for Google Agent Development Kit |
| Anthropic MCP | bitpolar_anthropic | MCP server (stdio + SSE) for Claude |
| AutoGen | bitpolar_autogen | Memory store for Microsoft agents |
| SmolAgents | bitpolar_smolagents | HuggingFace agent tool |
| PydanticAI | bitpolar_pydantic_ai | Type-safe Pydantic tool definitions |
| Agno (Phidata) | bitpolar_agno | Knowledge base for high-perf agents |

Agent Memory Frameworks

| Integration | Package | Description |
|---|---|---|
| Mem0 | bitpolar_mem0 | Vector store backend for Mem0 |
| Zep | bitpolar_zep | Compressed store with time-decay scoring |
| Letta (MemGPT) | bitpolar_letta | Archival memory tier |

Vector Databases

| Integration | Package | Description |
|---|---|---|
| Qdrant | bitpolar_embeddings.qdrant | Two-phase HNSW + BitPolar re-ranking |
| Milvus | bitpolar_milvus | Client-side compression with reranking |
| Weaviate | bitpolar_weaviate | Client-side compression with reranking |
| Pinecone | bitpolar_pinecone | Metadata-stored compressed codes |
| Redis | bitpolar_redis | Byte string storage with pipeline search |
| Elasticsearch | bitpolar_elasticsearch | kNN search + BitPolar reranking |
| PostgreSQL | bitpolar-pg | Native pgrx extension (SQL functions) |
| DuckDB | bitpolar_duckdb | BLOB storage with SQL queries |
| SQLite | bitpolar_sqlite_vec | Zero-dependency embedded vector search |
| Supabase | bitpolar_supabase | Serverless pgvector compression |
| Neon | bitpolar_neon | Serverless Postgres driver |

LLM Inference Engines (KV Cache)

| Integration | Package | Description |
|---|---|---|
| vLLM | bitpolar_vllm | KV cache quantizer + DynamicCache |
| HuggingFace Transformers | bitpolar_transformers | Drop-in DynamicCache replacement |
| llama.cpp | bitpolar_llamacpp | KV cache compression |
| SGLang | bitpolar_sglang | RadixAttention cache compression |
| TensorRT-LLM | bitpolar_tensorrt | KV cache quantizer plugin |
| Ollama | bitpolar_ollama | Embedding compression client |
| ONNX Runtime | bitpolar_onnx | Model embedding quantizer |
| Apple MLX | bitpolar_mlx | Apple Silicon quantizer |

ML Frameworks

| Integration | Package | Description |
|---|---|---|
| PyTorch | bitpolar_torch | Embedding quantizer, BitPolarLinear, KV cache |
| PyTorch (native) | bitpolar_torch_native | PT2E quantizer backend |
| JAX/Flax | bitpolar_jax | JAX array compression + Flax module |
| TensorFlow | bitpolar_tensorflow | Keras layers for compression |
| scikit-learn | bitpolar_sklearn | TransformerMixin for sklearn pipelines |

Cloud & Enterprise

| Integration | Package | Description |
|---|---|---|
| Spring AI | BitPolarVectorStore.java | Java VectorStore for Spring Boot |
| Vercel AI SDK | bitpolar_vercel | Embedding compression middleware |
| AWS Bedrock | bitpolar_bedrock | Titan/Cohere embedding compression |
| Triton | bitpolar_triton | NVIDIA Inference Server backend |
| gRPC | bitpolar-server | Language-agnostic compression service |
| MCP | bitpolar_mcp | AI coding assistant tool server |
| CLI | bitpolar-cli | Command-line compress/search/bench |

How It Works

Input f32 vector
    │
    ▼
┌──────────────────┐
│ Random Rotation  │  WHT (O(d log d)) or Haar QR (O(d²))
│                  │  Spreads energy uniformly across coordinates
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│    PolarQuant    │  Groups d dims into d/2 pairs → polar coords
│    (Stage 1)     │  Radii: lossless f32 │ Angles: b-bit quantized
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   QJL Residual   │  Sketches reconstruction error
│   (Stage 2)      │  1 sign bit per projection → unbiased correction
└────────┬─────────┘
         │
         ▼
TurboCode { polar: PolarCode, residual: QjlSketch }

Inner product estimation: ⟨v, q⟩ ≈ IP_polar(code, q) + IP_qjl(sketch, q)
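
The Stage 1 pairing can be sketched in a few lines of numpy. This is an illustrative toy rather than the crate's algorithm (it skips the rotation and QJL stages and assumes uniform angle bins):

```python
import numpy as np

def polar_encode(v, bits):
    """Pair consecutive coords, keep each pair's radius, quantize its angle."""
    pairs = v.reshape(-1, 2)
    radii = np.hypot(pairs[:, 0], pairs[:, 1])      # kept lossless
    angles = np.arctan2(pairs[:, 1], pairs[:, 0])   # in (-pi, pi]
    levels = 1 << bits
    # round each angle to one of 2^bits evenly spaced bins
    idx = np.round((angles + np.pi) / (2 * np.pi) * levels).astype(np.int64) % levels
    return radii, idx.astype(np.uint8)

def polar_decode(radii, idx, bits):
    levels = 1 << bits
    angles = idx * (2 * np.pi / levels) - np.pi
    pairs = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    return pairs.reshape(-1)

rng = np.random.default_rng(0)
v = rng.standard_normal(128).astype(np.float32)

radii, idx = polar_encode(v, bits=4)
approx = polar_decode(radii, idx, bits=4)
rel_err = np.linalg.norm(v - approx) / np.linalg.norm(v)
```

With 4-bit angles the bin width is 2π/16, so the per-pair error is bounded by the radius times roughly 0.2 rad; the QJL residual stage then sketches whatever error remains.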

Parameter Selection

| Use Case | Bits | Projections | Notes |
|---|---|---|---|
| Semantic search | 4-8 | dim/4 | Best accuracy for retrieval |
| KV cache | 3-6 | dim/8 | Memory vs attention quality |
| Maximum compression | 3 | dim/16 | Still provably unbiased |
| Lightweight similarity | 1 | dim/4 | QJL standalone (1-bit sketches) |

Feature Flags

| Feature | Default | Description |
|---|---|---|
| std | Yes | Standard library (nalgebra QR, full rotation) |
| alloc | No | Heap allocation without std (Vec via alloc crate) |
| serde-support | Yes | Serde serialization for all types |
| simd | No | Hand-tuned NEON/AVX2 kernels |
| parallel | No | Parallel batch operations via rayon |
| tracing-support | No | OpenTelemetry-compatible instrumentation |
| ffi | No | C FFI exports for cross-language bindings |
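
For example, to opt into the SIMD kernels and rayon-based batch operations (feature names taken from the table above; this Cargo.toml fragment is a sketch, and exact combinations may vary by target):

```toml
[dependencies]
bitpolar = { version = "0.3", features = ["simd", "parallel"] }
```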

no_std Support

BitPolar works on embedded/edge targets with no_std:

[dependencies]
bitpolar = { version = "0.3", default-features = false, features = ["alloc"] }

Uses libm for math functions and alloc for Vec/String. The Walsh-Hadamard rotation is available without std (unlike Haar QR which requires nalgebra).

Traits

BitPolar exposes composable traits for ecosystem integration:

  • VectorQuantizer — core encode/decode/IP/L2 interface
  • BatchQuantizer — parallel batch operations (behind parallel feature)
  • RotationStrategy — pluggable rotation (QR, Walsh-Hadamard, identity)
  • SerializableCode — compact binary serialization

Examples

30 Python examples, 9 Rust examples, and additional JavaScript, Go, and Java examples.

# Rust
cargo run --example search_vector_database
cargo run --example llm_kv_cache

# Python (30 examples covering all 58 integrations)
python examples/python/01_quickstart.py           # Core API
python examples/python/12_pytorch_quantizer.py     # PyTorch integration
python examples/python/13_llamaindex_vectorstore.py # LlamaIndex
python examples/python/14_faiss_dropin.py          # FAISS replacement
python examples/python/18_openai_agents_tool.py    # OpenAI Agents
python examples/python/23_vector_databases.py      # DuckDB, SQLite, etc.
python examples/python/30_complete_rag.py          # End-to-end RAG pipeline

See examples/README.md for the full list.

Performance

Run benchmarks:

cargo bench


Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to add a new quantization strategy.

License

Licensed under either of:

at your option.
